[57] ABSTRACT

A style of document format known as compound document
architecture is known, in which a document is broken up into a
tree of objects or segments (e.g. document: chapter: subtitle:
para: para: etc.), possibly with a second layout tree. Two styles
of such architecture are ODA and CDA. Conversion from CDA to ODA
presents difficulties because, in CDA, a segment can contain e.g.
both text and graphic elements, while ODA has stricter formatting
rules. One of a plurality of DAPs (Document Application Profiles)
12 is selected, depending on which subset of full ODA is being
used. The DAP contains a structure converter component which
starts to construct the objects of the ODA document. When an
information element is reached (text, graphics, etc.), it is sent
to the appropriate one of a set of content handlers 13. These
call back to callback units (text, graphics, footnote, etc.) in
the DAP when an ODA logical object is to be completed, and call
on each other (possibly recursively, as for footnotes in text)
when a change of information element type is reached.

14 Claims, 6 Drawing Sheets 
DATA FORMAT CONVERSION

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to the conversion of 
data between different formats, and more particularly to 
such conversion between different forms of compound 
document architecture. 

2. Data Architectures — General 

In word processing, desk-top publishing, and the like, a variety
of data formats have been developed. The simplest formats are
designed to deal only with textual matter. Even with such an
apparently simple situation, there are many aspects which may or
may not be dealt with, and which, if they are dealt with, may be
dealt with in a variety of ways. These include, for example, word
wrapping, right justification, headers and footers, and margin
control. More advanced formats include provision for things like
footnotes and paragraph and section numbering, tabular
information, geometric graphics, and bit image graphics.

As the nature of such formats becomes increasingly complex, it
becomes convenient to distinguish between the general principles
or rules of a format and the details of the implementation of
such a set of principles. The set of principles is commonly
termed an architecture, and an implementation is often termed an
interchange format of that architecture.

With a simple architecture, such as that provided by a basic word
processing system, the structural features of a document
formatted under that architecture are very simple. With such an
architecture, the document will normally be formatted as a stream
of alphanumeric characters (the text) with whatever structural
features it has being embedded within that stream. That is,
features such as line returns will be represented by control
codes occurring at the appropriate points within the character
stream. Similarly, such matters as the starting and stopping of
italics and bold-face can be represented by the inclusion of
control characters in the character stream. This style of
architecture can obviously be extended to more complicated
situations. For example, if the architecture includes pagination,
the codes for page endings can obviously be included in the
character stream.

The appropriate information regarding such matters as line and
page length must obviously be specified. In very simple cases,
this can sometimes be dealt with by the operation of whatever
printer the document is sent to. However, it is more usual for
such information to be included within the document. If the line
and page lengths are unchanged throughout the document, then this
information will normally be included at the beginning of the
document, preceding the text, and forming a header block.

This technique naturally extends to the provision of similar
control blocks at appropriate points within the information
stream where the format (in the sense of such matters as margins
and page length) of the document changes. A simple example is the
inclusion of a quotation in the form of a distinct paragraph
using a smaller typeface than, and with its margins inset from,
the main text. This will be preceded by a control block setting
the typeface and margins, and will naturally be followed by
another control block setting the original typeface and margins.

The discussion so far has largely assumed that the document is
fully formatted. However, it is often convenient to separate the
informational contents of the document from its final formatting.
The document in its initial state is in what is termed
processable form. (The term "informational contents" is to be
taken in the broad sense of including the general parameters of
the document, such as page headers and footers, typeface,
paragraph insets, narrowed margins, etc.)

With a processable document, the details of the actual page
layout (primarily page width or line length and page length) are
left undefined. When the document is finally printed as a hard
copy, the printing system has to have such formatting information
supplied to it, and the system has to calculate the positions of
line and page endings and make appropriate adjustments (e.g. in
page numbering and footnote positioning).

A major advantage of using documents in processable form is that
editing of the document is simplified, for two reasons. One is
that the editing of the document (usually on a word processor)
does not have to take account of the details of the layout of the
document on the page. This means that the operation of the word
processor is faster, particularly in situations where the
operator jumps between widely separated locations in the
document. (If the document is maintained in fully formatted form,
the system would have to reformat the whole of the text between
the relevant locations in moving forward from one to the other.)

The other reason is that the document can be transferred between
different systems much more easily. Such different systems may
involve different hardware, or something apparently as minor as a
slight change of printer character style (which will require
recalculation of line lengths), or perhaps simply a change in the
size of paper. Again, if the document were fully formatted, such
a change would involve reformatting the whole document, whereas
with the document in processable form, no change is required to
the document.

It will be assumed from here on that documents are in processable
form.

Compound Document Architecture — General

The formatting technique discussed above consists essentially of
incorporating the formatting information in the character stream
of the document. With a very simple word processing system, such
information may consist for example of little more than
paragraphing, and this technique is generally satisfactory.
However, in more complicated situations, this technique can
become unwieldy.

A simple example can arise in the situation mentioned above,
where there are quotations in the text. If there are several such
quotations, there will be a corresponding number of repetitions
of the control blocks. Another situation is where the document is
large, consisting of a number of chapters each subdivided into
sections.

Further, there is often a need for systems which are more
elaborate than simple word processing systems. The possibilities
of such things as typeface changes have already been mentioned,
and such systems can be developed further to permit the inclusion
of, for example, geometric and/or raster graphics and various
kinds of lists and tables.

For these and other reasons, an alternative approach has been
developed as the architecture becomes more complicated, giving a
second style of architecture, generically termed compound
document architecture.
This second style involves the separation of the structure of a
document from its contents.

With this approach, the logical structure of a document is
defined in the form of a tree, which can most easily be explained
by means of an example. Suppose the document is a textbook. Then
the first level of structure of the document may be defined as
consisting of a title, a list of contents, an introduction, a
number of chapters, a list of references, and an index, with the
number of chapters being variable and, say, the presence of an
introduction being optional. In turn, each of these elements has
its own structure; for example, the structure of a chapter may
consist of a title and a number of sections each with a section
heading. An element with such a structure of its own is termed a
compound element; an element with no such subordinate structure
is termed a basic element.
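
The textbook example above can be sketched as a small tree. This is only an illustration of the compound/basic distinction: the class name `Element`, the helper `basic_elements`, and the particular element names are assumptions made for the sketch and are not taken from any CDA or ODA specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """One node of a logical structure tree (illustrative names only)."""
    name: str
    children: List["Element"] = field(default_factory=list)

    @property
    def is_compound(self) -> bool:
        # A compound element has subordinate structure; a basic one has none.
        return bool(self.children)


def basic_elements(node: Element) -> List[str]:
    """Return the names of all basic (leaf) elements, left to right."""
    if not node.is_compound:
        return [node.name]
    names: List[str] = []
    for child in node.children:
        names.extend(basic_elements(child))
    return names


# Hypothetical textbook with one chapter containing one section.
textbook = Element("textbook", [
    Element("title"),
    Element("contents list"),
    Element("chapter", [
        Element("chapter title"),
        Element("section", [Element("section heading")]),
    ]),
    Element("references"),
    Element("index"),
])
```

Calling `basic_elements(textbook)` walks the tree and lists only the elements that have no structure of their own; the chapter and section are compound and therefore do not appear in the result.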

With this second, more formalized and structured, architectural
style, the format or layout of the document is a separate and to
some extent subsidiary matter which has to be superimposed on the
logical structure. This is done by defining a layout structure
for each logical element. In principle, this could be achieved by
defining a set of independent layout structures, but since the
desired layouts of the various logical elements will usually have
many features in common, this is achieved in practice by defining
a tree of layout elements.

The logical structure will thus consist of a tree of elements,
with the final or "leaf" elements containing the actual data
(text or other information). The "leaf" elements of the layout
structure or tree will be associated with the "leaf" elements of
the logical structure. Minor local features of the layout, such
as boldface and italics, are embedded in the character string of
the logical element by the layout process.

As with the first architectural style, the compound document
style can be tailored to produce either a processable or a fully
formatted document. There is a further complication with the
fully formatted form, since the logical elements as so far
described are independent of page breaks. What is usually done is
that a logical element crossing a page break is redefined as two
separate logical elements, one on each page. (These two logical
elements naturally share a common logical parent.) We shall here,
however, assume as before that we are concerned primarily with
documents in processable form.

A different issue has to be addressed if complicated information
is to be dealt with. Information may, as assumed so far, be in
the form of text; however, it may also be desirable to deal with
information in other forms, such as geometric graphics, raster
(bit image) graphics, tables, spreadsheets, and so on.
(Geometrical graphics are diagrams defined in terms of lines
whose end-point coordinates are given, circles whose center
coordinates and radii are given, fill patterns, etc.; bit image
graphics are diagrams defined by the states of their individual
pixels.) These will in general need their own formats to be
defined largely independently of the formats of other types of
information, and these formats will be included as control blocks
or layout elements depending on the architectural style.

In general, blocks of information of different types may follow
each other and/or be embedded in each other fairly freely in the
first architectural style. In the second style, a basic logical
element must be of a single information type, but a compound
logical element can include basic logical elements of different
information types.
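
The single-type rule for basic logical elements can be illustrated with a short sketch. It assumes a run of content items, each tagged with an information type, and groups consecutive items of the same type so that every group could become one basic element under a common compound parent. The function name `split_by_type` and the sample data are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

Item = Tuple[str, str]  # (information type, payload)


def split_by_type(items: List[Item]) -> List[Tuple[str, List[str]]]:
    """Group consecutive items that share an information type.

    Each resulting group holds content of exactly one type, which is the
    constraint a basic logical element has to satisfy in the second style.
    """
    return [
        (kind, [payload for _, payload in run])
        for kind, run in groupby(items, key=lambda item: item[0])
    ]


# Hypothetical mixed run: text, then a graphic, then more text.
mixed = [
    ("text", "See the diagram."),
    ("text", "It shows a tree."),
    ("graphics", "<line 0,0 10,10>"),
    ("text", "End of section."),
]
basic = split_by_type(mixed)
```

Here `basic` contains three groups (text, graphics, text). The order of the content is preserved; only the grouping changes.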

A special situation arises with footnotes, which have to be
treated to some extent as if they are of a different information
type although they are usually text occurring in text. With the
second architectural style, a footnote is dealt with by means of
a separate logical element. (With the first style, a footnote is
embedded in the character stream by means of a suitable control
block.)

There is of course a vast variety of specific data for- 
mats, and there is often a need to convert data from one 
format to another. Such conversion can rarely be per- 
formed perfectly even if the two formats use the same 
architectural style. The present invention is concerned 
with the conversion between certain different imple- 
mentations or architectures of the second style. 

CDA and ODA Compound Document Architectures 

Several compound document architectures currently 
exist, e.g. CDA and ODA (note that the word "com- 
pound" is used in two different senses, one generic and 
the other specific (CDA)). Each of these defines how 
compound documents can be represented and stored. 
More specifically, CDA is Compound Document Ar- 
chitecture and ODA is Open Document Architecture. 
These have associated with them two respective spe- 
cific formats, DDIF (Digital Document Interchange 
Format) and ODIF (Office Document Interchange 
Format) respectively. The ODA/ODIF architecture 
and format are defined by ISO document ISO 8613, and 
the DDIF architecture and format are defined by docu- 
ment DEC STD 078 of Digital Equipment Corpora- 
tion. CDA and ODA are architectures, and DDIF and 
ODIF are particular formats which conform with those 
respective architectures. 

Both these architectures and their associated formats are widely
used, and there is an obvious need to be able to convert data in
either format into the other. However, these two architectures
and formats differ significantly. The conversion between them is
therefore not trivial.

The present invention will be described with reference to
CDA/DDIF and ODA/ODIF. It should however be understood that it is
not limited to any particular features of those particular
architectures, and also that many of the characteristics of those
architectures are described in modified and/or simplified form
for present purposes.

The details of the structures of documents in 
ODA/ODIF and CDA/DDIF are defined by the docu- 
ments noted above. However, to understand the present 
invention, a brief informal description of these struc- 
tures is desirable. 

Both structures utilize a hierarchy of elements (gener- 
ally termed "objects" in ODA/ODIF and "segments" 
in CDA/DDIF) arranged in tree form, the tree starting 
with a root element. For example, if the document is a 
book, the root element will be the entire document (the 
book); the elements in the first level down will be the 
chapters; the elements in each chapter may be a title and 
a number of sections; the elements in a section may be a 
section heading and a number of paragraphs; and so on.
Each element will in general have a number of attri- 
butes, each of which can have one of a range of possible 
values. 

In ODA/ODIF, the elements just mentioned form the logical
structure. The root element is termed a root logical object. At
the "leaf" ends of the branches, the lowermost elements are
termed "contents portions"; these contain the content or material
of the document. The elements between the root element and the
content portion elements are termed "basic logical objects"
(BLOs) if they contain only contents portions, or "compound
logical objects" (CLOs) otherwise (i.e. if they contain further
logical objects).

In ODA/ODIF, the layout of a document is described by a layout
structure which is also in the form of a tree. The layout
structure has a root element, the document. The elements one
level down are page sets; the elements the next level down are
pages; these may be followed by frames (optionally several levels
thereof); and these by blocks (forming the lowest level). This
layout tree is independent of the logical structure, though it
will usually have some general similarities with that logical
structure. For example, if the document is a book, then the
logical structure of a chapter element being composed of a
heading element and section elements will usually be reflected in
a layout structure of a heading layout element and a section
layout element, but most or all of the logical section elements
will share a single common section layout element.

In ODA/ODIF, these two trees for a document must be constructed
in such a way that each content portion of the logical structure
is paired with a corresponding block in the layout structure. As
noted above, this will often involve splitting a paragraph
between two columns or pages.

It will be realized that when a document is being worked on
(created or edited), the full constraints are not normally
observed. This is because, as is well recognized, the editing of
a document is slowed and complicated considerably if the document
is reformatted every time a small change is made to its contents.
Instead, the full constraints are imposed on the document when
editing ends. (This principle applies equally to ODA/ODIF and
CDA/DDIF.)

In CDA/DDIF, there are logical and layout tree structures
somewhat similar to those of ODA/ODIF. However, the structure of
CDA/DDIF is less rigidly constrained and defined than that of
ODA/ODIF, which is more rigidly formalized and structured.

To some extent, CDA/DDIF uses an implicit galley-based layout
(though the attributes of the various elements will generally
define many of the characteristics or features of the layout, for
example fonts, font sizes, line widths, etc.). CDA/DDIF is thus
simpler than ODA/ODIF in some ways; however, it is more elaborate
in others (for example, it allows the use of live links and
external references).

To a considerable extent, individual elements in ODA/ODIF and
CDA/DDIF correspond directly to each other on a one-to-one basis.
Among these elements may be, for example, elements, attribute
names, attribute values, text strings, and so on.

However, the differences between these two architectures mean
that this correspondence is not exact. Thus CDA/DDIF allows what
is in effect a nesting of segments, which results in the
segmentation being somewhat less clear-cut than in ODA/ODIF.
Further, in this nesting in CDA/DDIF, a contents portion of one
type (e.g. textual) can be followed by a contents portion of
another type (e.g. graphical), whereas in ODA/ODIF, contents
portions of different types must be in distinct (logical)
objects. (The different types of contents, e.g. text, raster
graphics, and geometric graphics, may also be termed
metaclasses.)

In addition to the obviously different types of contents (text,
geometric graphics, raster graphics, list structures, and so on),
there is another situation in which different pieces of contents
need to be treated to some extent as different types, even though
they are both textual. This situation is that of footnotes. In
CDA/DDIF, a footnote can be included (embedded) in the middle of
a textual segment. In ODA/ODIF, a footnote has to have its own
object (or, in fact, its own object tree, as will be described
later). Somewhat similar situations arise with things such as the
insertion of page numbers in the form of cross references in the
body of the document (as distinct from page numbers in headers or
footers, which are not involved in documents in processable
form).

For the operation of the system to be explained, it is desirable
to outline the nature of ODA/ODIF and CDA/DDIF in rather more
detail.

ODA Document Structure

Considering first ODA/ODIF, an ODIF document is structured as a
pair of trees, a logical tree for the contents and a layout tree.
The logical tree ends in blocks which correspond one-to-one with
"contents" objects (basic logical objects, BLOs) and which are
the actual contents of the BLOs. The two trees meet at the "leaf"
level, with each BLO (or block) of the logical tree being also a
final or "leaf" element of the layout tree. A single "contents"
object can contain contents of only one type: text, geometric
graphics, bit image graphics, etc. (Hence a diagram will
typically be defined by a "diagram" object as consisting of a
geometric graphics object, a title, and optionally one or more
labels to go in the diagram.)

The objects of the logical tree, as discussed above, define the
logical structure of the document, and each object constrains the
number, type, and arrangement of the objects connected directly
below it. The layout objects define the layout of the document,
and each may contain attributes. Attributes are inherited by
default; thus if an object does not declare values for some
attributes, those attributes have the values declared in the
object above it in the tree. Logical objects may also contain
attributes, which override the attributes of the layout objects
which would otherwise apply. Each final layout object defines the
layout and presentation of the contents of the logical objects
coupled to it.

The general structure of an ODIF document can thus be shown as a
double tree, FIG. 1 being a simplified example. The document can
alternatively be represented in tabular form, as shown in Table
I. This is broadly the form in which the document is stored in
memory. The tabular structure of the document in this form is
present in a relatively explicit manner.

TABLE I

Logical object 1
Listing of permitted sub-objects
Logical object 1.1
"Title"
Listing of permitted sub-objects
[illegible]
Listing of permitted sub-objects
•••
Logical object 1.1.1
Logical object 1.2.1
"Subtitle"
Listing of permitted sub-objects



Logical object 1.2.2 
"Section" 

Listing of permitted sub-objects 
Logical object 1.2.3 
"Section" 

Listing of permitted sub-objects 
••• 

Logical object 1.2.1 
"Content" 
Block 1.2.1 

Layout - layout object 1.1.1 
Logical object 1.2.1.1 
"Content" 
Block 1.2.1.1 

Layout - layout object 1.1.2 
Logical object 1.2.1.2 
"Content" 
Block 1.2.1.2 

Layout - layout object 1.1.2 

Layout object 1 

"Document" 

Attributes 

Listing of permitted sub-objects 
Layout object 1.1 
"Title page" 
Attributes 

Listing of permitted sub-objects 

Layout object 1.2 

"Page" 

Attributes 

Listing of permitted sub-objects 
••• 

Layout object 1.1.1 
'Title page frame" 
Attributes 
Layout object 1.1.2 
"Frame" 
Attributes 
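
The attribute inheritance described for ODA layout objects can be sketched as follows. This is a minimal illustration under stated assumptions: the class `LayoutObject`, the function `effective`, and the attribute names and values are invented for the example and do not reproduce the ISO 8613 attribute set.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LayoutObject:
    """A layout tree node that inherits undeclared attributes from its parent."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    parent: Optional["LayoutObject"] = None

    def resolve(self, attr: str) -> Optional[str]:
        # Walk up the tree until some object declares the attribute.
        node: Optional[LayoutObject] = self
        while node is not None:
            if attr in node.attributes:
                return node.attributes[attr]
            node = node.parent
        return None


def effective(attr: str, layout: LayoutObject,
              logical_attributes: Dict[str, str]) -> Optional[str]:
    """An attribute on the logical object overrides the layout value."""
    if attr in logical_attributes:
        return logical_attributes[attr]
    return layout.resolve(attr)


# Hypothetical three-level layout tree.
document = LayoutObject("Document", {"font": "serif", "margin": "25mm"})
page = LayoutObject("Page", {"margin": "20mm"}, parent=document)
frame = LayoutObject("Frame", {}, parent=page)
```

In this sketch `frame` declares nothing, so it takes `margin` from the page and `font` from the document, while a logical object that declares its own `font` takes precedence over both.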






CDA Document Structure 

Considering now CDA/DDIF, a DDIF document is also generally
structured as two trees of segments, a logical (or "contents")
tree and a layout tree. As with ODA/ODIF, the logical segments
may contain layout or formatting information or attributes as
well as contents, and it is sufficient for present purposes to
consider primarily the logical segments. Each segment may contain
attributes, and also primitive contents elements. If a segment
contains primitive contents elements (that is, informational
contents) then it also includes an indication of the type of the
contents (that is, whether the contents are text, geometric
graphics, bit image graphics, etc.).

The general structure of a DDIF document can thus be shown as a
tree, FIG. 2 being a simplified example. The same document can
alternatively be represented in tabular form, as shown in Table
II. This is broadly the form in which the document is stored in
memory. (The indentations are of course provided only to clarify
its logical structure.) The tree structure is present only in
implicit form.

TABLE II

Segment A
attributes
[
Segment B
attributes
[
]
Segment C
attributes
[
]
[
primitive contents element X2
primitive contents element X3
primitive contents element X4
Segment E
computed contents
primitive contents element X5
]
Segment D
attributes
[
primitive contents element X6
primitive contents element X7
]
primitive contents element X1



It will be realized, of course, that there are many 
features of ODA/ODIF and CDA/DDIF which are 
not described here. For example, a document also has a 
document description (which identifies the version of 
DDIF being used and the software which created the 
document), and a document header (which contains,
e.g., title, author, version number, date, etc.). The document
content may also include further features, such as
the "computed contents" segment E of FIG. 2, which 
provides contents which are copied from some outside 
source (elsewhere in the document, or from some other 
document). A reference in the text to the page number 
of another part of the document is one example of this, 
and the numbering of footnotes is another. 

It will be realized that when a document is in ODIF or DDIF form,
it is in the form of a stream of data (some of which is control
data). Although the full tree structures, for example, are of
course present, these tree structures are implicit and can only
be recovered from the stream of data by suitable processing.
Mechanisms termed toolkits have been developed, for both ODA/ODIF
and CDA/DDIF, which can analyze a document in ODIF or DDIF form
to allow it to be manipulated much more readily than would be
possible if the string itself had to be analyzed for each
operation. Such a toolkit can manipulate the attributes of a
document, but cannot manipulate the document itself (its
"contents").
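
The recovery of an implicit tree from a flat stream can be sketched with a stack. The token shapes used here (`("begin", name)`, `("content", value)`, `("end",)`) are assumptions chosen for illustration; they are not the DDIF or ODIF encodings, and the function `build_tree` is not part of any real toolkit.

```python
from typing import Any, Dict, List, Tuple

Token = Tuple[str, ...]


def build_tree(tokens: List[Token]) -> Dict[str, Any]:
    """Rebuild a nested structure from a flat stream of begin/end markers."""
    root: Dict[str, Any] = {"name": "root", "children": []}
    stack = [root]  # the innermost open segment is always stack[-1]
    for token in tokens:
        kind = token[0]
        if kind == "begin":
            node = {"name": token[1], "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif kind == "content":
            stack[-1]["children"].append(token[1])
        elif kind == "end":
            if len(stack) == 1:
                raise ValueError("end marker without a matching begin")
            stack.pop()
        else:
            raise ValueError(f"unknown token kind: {kind}")
    if len(stack) != 1:
        raise ValueError("stream ended with an unclosed segment")
    return root


# Hypothetical stream: segment A holds content X1 and a nested segment B.
stream = [
    ("begin", "A"),
    ("content", "X1"),
    ("begin", "B"),
    ("content", "X2"),
    ("end",),
    ("end",),
]
tree = build_tree(stream)
```

Once the tree has been rebuilt in this way, later operations can navigate it directly instead of rescanning the stream each time.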

It will be realized, of course, that the conversion of a document
from one format to another will normally result in a loss of
finer details of the arrangement of the document. For example,
the precise fonts and range of character sizes may be different
in the two systems. A more extreme example is if the source
document contains a graphic element (a drawing or diagram) with
associated text elements (labels for parts of the drawing). The
conversion is most unlikely to be able to automatically locate
the converted text elements in the correct positions in the
converted graphic element; and rather than attempt that task, the
conversion will normally merely convert the text elements into a
simple list above or below the graphic element. Such loss of fine
details is normally tolerable; it is however generally regarded
as essential that no information content (as opposed to its form
of presentation) should be lost in the conversion.

The general object of the invention is to provide an improved
means of converting documents in a CDA-like form into an ODA-like
form.

SUMMARY OF THE INVENTION 

According to one aspect, the present invention provides a data
structure format conversion system comprising a plurality of
profile conversion components for converting the logical and/or
layout structures of the source document to those of the target
format; a plurality of content architecture conversion components
selected by the profile conversion components for converting the
contents of the source document to the target format and being
invoked by the required content architecture conversion
component; and a main converter component for identifying the
profile to which the compound document conforms and invoking the
required profile conversion component.

This aspect of the present conversion system thus provides for
the conversion of compound documents where a well-defined subset
of the entire compound document architecture is used to represent
a particular set of compound documents. In this case, the subset
of the entire compound document architecture is termed a
"profile". A profile therefore identifies what parts of the
entire compound document architecture are allowed to be used to
represent this subset of documents. It identifies the attributes
(and their values) that may be defined within compound documents
which conform to the profile. It may also define constraints on
the complexity of the logical and/or layout structures defined by
those compound documents.
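
The idea of a profile as a permitted subset can be sketched as a simple conformance check. The class `Profile`, the function `violations`, the attribute names, and the use of tree depth as the complexity constraint are all assumptions made for this illustration; real Document Application Profiles are defined in far more detail.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Profile:
    """A hypothetical profile: permitted attribute values plus a depth limit."""
    name: str
    allowed: Dict[str, Set[str]]  # attribute name -> permitted values
    max_depth: int                # stand-in for a structural complexity limit


def violations(profile: Profile, attributes: Dict[str, str],
               depth: int) -> List[str]:
    """List every way the given attributes and depth fall outside the profile."""
    problems: List[str] = []
    for attr, value in attributes.items():
        if attr not in profile.allowed:
            problems.append(f"attribute not in profile: {attr}")
        elif value not in profile.allowed[attr]:
            problems.append(f"value not permitted for {attr}: {value}")
    if depth > profile.max_depth:
        problems.append(f"structure too deep: {depth} > {profile.max_depth}")
    return problems


# Hypothetical profile allowing two fonts and at most three levels.
simple = Profile("simple", {"font": {"serif", "sans"}}, max_depth=3)
```

A document that uses only permitted attributes and values within the depth limit produces an empty list, which in this sketch means it conforms to the profile.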

This architecture allows the system to be designed so that
further profile and/or content architecture conversion components
can readily be added without requiring modification of the
existing components of the system.

According to another aspect, the present invention provides a
data structure format conversion system comprising a structure
architecture conversion component for generating logical objects
of the target document architecture, comprising a structure
converter unit for generating compound logical objects of the
target document and a plurality of callback units, each for
generating a different type of basic logical object or set of
logical objects including at least one basic logical object; a
plurality of content architecture conversion components each for
converting a different one of the possible data types of the
content of the source document to the target format, each being
callable from the others and from the units of the structure
architecture conversion component, and each capable of calling
back to any one of a corresponding set of the callback units; so
as to effect conversion of intermixed content types in the source
document.

It is possible for content of different types to be intermixed
within a compound document. This aspect of the present system
provides a system of callbacks to handle this situation. When
content of a first type is found to have content of a second type
embedded in it, the conversion component for the first type will
be unable to convert the content of the second type. It will
therefore perform a callback to the profile conversion component.
In response, the profile conversion component invokes the content
conversion component for the second type of content, and this
content conversion component then performs the necessary
conversion and, when this conversion is completed, returns to the
profile conversion component. The profile conversion component
recognizes that callback processing was being performed and
therefore returns processing to the first content type's
conversion component.
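
The callback sequence just described can be sketched in a few lines. This is only an illustration of the control flow under stated assumptions: the class `ProfileConverter`, the handler functions, and the representation of content as strings and `(type, payload)` pairs are hypothetical and are not the components of the embodiment.

```python
from typing import Callable, Dict, List


class ProfileConverter:
    """Stand-in for a profile conversion component that routes content."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[object, "ProfileConverter"], None]] = {}
        self.objects: List[str] = []  # stand-in for generated target objects

    def register(self, content_type: str,
                 handler: Callable[[object, "ProfileConverter"], None]) -> None:
        self._handlers[content_type] = handler

    def convert(self, content_type: str, payload: object) -> None:
        """Invoke the handler for a content type; also used as the callback."""
        self._handlers[content_type](payload, self)


def convert_text(payload: object, profile: ProfileConverter) -> None:
    for part in payload:  # payload is a list of strings and embedded items
        if isinstance(part, str):
            profile.objects.append(f"text object: {part}")
        else:
            embedded_type, embedded_payload = part
            # The text handler cannot convert this content, so it calls back.
            profile.convert(embedded_type, embedded_payload)
            # Control returns here and text conversion resumes.


def convert_graphics(payload: object, profile: ProfileConverter) -> None:
    profile.objects.append(f"graphics object: {payload}")


profile = ProfileConverter()
profile.register("text", convert_text)
profile.register("graphics", convert_graphics)
profile.convert("text", [
    "Before the figure.",
    ("graphics", "circle r=5"),
    "After the figure.",
])
```

Because the text handler never refers to the graphics handler directly, a handler for a further content type could be registered without changing either of the existing ones.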

Other objects, features and advantages of the inven- 
tion will become apparent from a reading of the specifi- 
cation when taken in conjunction with the drawings in 
which like reference numerals refer to like elements in 
the several views. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIGS. 1 and 2 are block diagrams illustrating the 
general structure of an ODA document and a CDA 
document respectively; 

FIG. 3 is a general block diagram of the converter 
according to the present invention; 

FIG. 4 is a more detailed block diagram of parts of 
the converter of FIG. 3; and 

FIGS. 5 and 6 are block diagrams illustrating the 
conversion of parts of two ODA documents to corre- 
sponding CDA documents by the converter of FIGS. 3 
and 4. 

DESCRIPTION OF THE PREFERRED 
EMBODIMENT 

The present conversion system separates the conver- 
sion of the content types (architectures) within the com- 
pound documents from the conversion of the logical 
and layout structures within those documents. It is 
made up of a number of separate conversion compo- 
nents, including a main conversion component, profile 
conversion components, and content architecture con- 
version components. The profile conversion compo- 
nents and content conversion components are identified 
by names corresponding to the profile or content archi- 
tecture names. The system provides for the addition of 
extra profile conversion components and content con- 
version components as new profiles and content 
architectures are developed. These additions 
may be effected without the need for any changes to be 
made to the other (existing) conversion components of 
the system. 
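
One way to picture components identified by name is a simple registry. The sketch below is only an assumption about how such a lookup could be organized; `ComponentRegistry`, its methods, and the component names used are hypothetical and do not come from the specification.

```python
from typing import Callable, Dict, List

Component = Callable[[str], str]


class ComponentRegistry:
    """Assumed name-keyed store for profile and content conversion components."""

    def __init__(self) -> None:
        self._components: Dict[str, Dict[str, Component]] = {
            "profile": {},
            "content": {},
        }

    def add(self, kind: str, name: str, component: Component) -> None:
        # Refuse to overwrite, so existing components are never disturbed.
        if name in self._components[kind]:
            raise ValueError(f"{kind} component {name!r} is already registered")
        self._components[kind][name] = component

    def get(self, kind: str, name: str) -> Component:
        return self._components[kind][name]

    def names(self, kind: str) -> List[str]:
        return sorted(self._components[kind])


registry = ComponentRegistry()
registry.add("content", "text", lambda data: f"TEXT({data})")
registry.add("content", "raster", lambda data: f"RASTER({data})")

# A later addition: nothing registered earlier is modified.
registry.add("content", "geometric", lambda data: f"GEOM({data})")
```

Because lookup is by name only, adding the third component requires no change to the two registered before it, which is the property the paragraph above describes.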

The main conversion component performs the func- 
tion of identifying the profile to which the compound 
document conforms and the selection and invocation of 
the profile conversion component. (If a compound doc- 
ument specifies a profile and a profile conversion com- 
ponent for that profile does not exist in the system, an 
error will be reported to the calling system and the 
conversion will be aborted.) The profile conversion 
component will then proceed with the processing and 
conversion of the logical and layout structures within 
the document being converted. When content is located 
within a compound document that is being converted, 
the content architecture to which the content belongs is 
identified and the corresponding content architecture 
conversion component is invoked to perform the con- 
version. 
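
The control flow of the main conversion component can be illustrated with a short Python sketch. Everything here is assumed for the example: the dictionary-based document, the names `main_convert`, `make_profile_handler` and `ConversionAborted`, and the use of "Q112" as a profile key.

```python
from typing import Callable, Dict, List


class ConversionAborted(Exception):
    """Raised when no profile conversion component exists for a document."""


def make_profile_handler(content_converters: Dict[str, Callable[[str], str]]):
    """Return a hypothetical profile handler that walks the document tree."""

    def handle(document: dict) -> List[str]:
        out: List[str] = []

        def walk(node: dict, depth: int) -> None:
            indent = "  " * depth
            if "content" in node:
                # Content found: dispatch on its content architecture name.
                convert = content_converters[node["arch"]]
                out.append(indent + convert(node["content"]))
            else:
                # Structure node: emit it, then descend into its children.
                out.append(indent + node["name"])
                for child in node["children"]:
                    walk(child, depth + 1)

        walk(document["root"], 0)
        return out

    return handle


def main_convert(document: dict, profile_handlers: Dict[str, Callable]) -> List[str]:
    """Identify the document's profile and invoke the matching handler."""
    name = document["profile"]
    handler = profile_handlers.get(name)
    if handler is None:
        # Report the error to the caller and abort the conversion.
        raise ConversionAborted(f"no profile conversion component for {name!r}")
    return handler(document)


handlers = {"Q112": make_profile_handler({"text": lambda s: f"TEXT({s})"})}
doc = {
    "profile": "Q112",
    "root": {"name": "FLOW", "children": [
        {"name": "P1", "children": [{"arch": "text", "content": "txt1"}]},
    ]},
}
```

Calling `main_convert(doc, handlers)` walks the structure and converts the single text content; a document naming a profile that is absent from `handlers` raises `ConversionAborted` instead of being converted.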
It is possible for content of different types to be intermixed
within a compound document. The present system provides a system of
callbacks to handle this situation. When content of a first type is
found to have content of a second type embedded in it, the
conversion component for the first type will be unable to convert
the content of the second type. It will therefore perform a callback
to the profile conversion component. In response, the profile
conversion component invokes the content conversion component for
the second type of content, and this content conversion component
then performs the necessary conversion and, when this conversion is
completed, returns to the profile conversion component. The profile
conversion component recognizes that callback processing was being
performed and therefore returns processing to the first content
type's conversion component.

FIG. 3 is a block diagram of the conversion system for converting a
DDIF document to ODIF form. The DDIF document may be taken as being
stored in a CDA converter kernel unit 15, which is coupled to an ODA
FEBE (front-end back-end) Main Component 14. A set of structure
handlers 12 is coupled to the FEBE unit 14, and a set of contents
handlers 13 is coupled to the structure handlers 12. These units 12
and 13 together perform the essential aspects of generating the
required ODIF document, which is passed to an ODA toolkit unit 11,
which is in turn coupled to an ODA document unit 10.

The principles of operation of the two components 14 and 15 are
described in U.S. patent application Ser. No. 07/368,716, filed Jun.
19, 1989, entitled SYSTEM AND METHOD FOR CONVERTING BETWEEN A SOURCE
STRUCTURE AND A TARGET STRUCTURE IN A DIGITAL DATA PROCESSING
SYSTEM, with inventors Martin L. Jack, et al., not shown here. It
may however be noted that the CDA Converter Kernel 15 has coupled to
it a CDA Toolkit unit (analogous to the ODA Toolkit unit 11), a DDIF
Document Unit, and possibly a DTIF (Document Table Interchange
Format) Document Unit; the details of the DDIF and DTIF units and/or
the structure of the data stored in them are described in the two
further U.S. applications filed simultaneously, Ser. No. 07/368,697
entitled TABULAR DATA FORMAT, filed on Jun. 19, 1989 by inventors
Carol A. Young and Neal F. Jacobson, and Ser. No. 07/368,703
entitled DATA STRUCTURE INCLUDING EXTERNAL REFERENCE ARRANGEMENT,
filed on Jun. 19, 1989 by Robert L. Travis et al.

The toolkits operate to analyze the associated documents
(identifying the various segments and other features of the
documents), to extract from the documents their various components,
and to generally manipulate the documents. The documents are
generally stored in forms suitable for reasonably efficient storage,
and the toolkits broadly provide an interface which presents the
organizational and structural aspects of the documents.

The ODA/ODIF standard is, of course, very elaborate, and many of its
users do not require the full range of ODA/ODIF facilities. Certain
subsets of the full ODA/ODIF standard have therefore been developed;
these are known as DAPs (Document Application Profiles). These
subsets can be loosely classified into levels. Level 1 is simple
text manipulation; level 2 is simple word processing (with page
numbering, and a fairly elaborate logical structure); level 3 is
complex word processing and desktop publishing (with complex
layouts, fonts and printing, diagrams, etc.); and level 4 is full
ODA/ODIF. Three specific DAPs are known as Q111, Q112, and Q121.

There is therefore a plurality of structure handlers 12, which are
also termed DAP handlers. An initial determination is made as to
which of the DAPs the document is to conform with, and the
appropriate DAP handler is selected accordingly. This results in
greater efficiency if the DAP level is low.

As discussed above, there are various types or metaclasses of
contents; such types may, for example, comprise text, raster
graphics, and geometric graphics. There is a plurality of contents
handlers 13 (also termed contents architecture handlers), one for
each type, to perform the conversion of contents of corresponding
types.

It will be realized that the system can be expanded by the addition
of further DAP Converters as further DAPs are developed in the
future, and also by the addition of further Contents Architecture
Handlers as the possible contents metaclasses are expanded in the
future.

To perform the conversion of the CDA document to an ODA form, its
structure (primarily its segmentation) has to be analyzed, its
various contents components converted, and the appropriate objects
of the ODA document which is being produced generated. The selected
DAP handler and the Contents Architecture handlers cooperate in
analyzing the structure of the CDA document; the various Contents
Architecture handlers perform the conversion of the various contents
components of the CDA document; and the selected DAP handler
generates the objects of the ODA document.

FIG. 4 is a more detailed block diagram of a typical DAP handler and
associated Content Architecture handlers. When a document is to be
converted, one of the set of DAP handlers is selected. A single DAP
handler 12-1 is therefore shown. It is assumed that it is for a DAP
which permits three types of contents: text TEXT, geometric graphics
GG, and raster graphics RG. Three respective Contents Architecture
handlers 13-1 to 13-3 are therefore shown.

In the process of converting a CDA document to an ODA document, the
various objects and blocks of the ODA document have to be
constructed. Each object has a structure including its
identification and links with other objects, and its contents block
(if any). In the conversion process, it is often necessary to
maintain several objects simultaneously in the process of
construction. The DAP handler 12 therefore includes an object store
20, in which objects in the process of being constructed can be
held. This store is a stack, since once the construction of a new
object has begun, no changes need be made to earlier objects still
under construction until the construction of the latest object has
been completed.

It is convenient for present descriptive purposes to assume that the
objects are completely assembled in the store 20. However, it will
be realized that their assembly can in fact be completed by the
units 14 and 15; these may contain a toolkit, corresponding to the
toolkit of unit 11, for that purpose. If an object has contents,
those contents will be generated by one of the contents handlers 13.

The DAP handler 12 contains a structure conversion component 21
which analyzes the structure of the in-
coming CDA document and initiates the generation of the appropriate
ODA compound logical objects. It also contains a plurality of
call-back processing units TEXT 22-1, GG 22-2, RG 22-3, and FN 22-4,
for processing text, geometric graphics, raster graphics, and
footnotes respectively. The call-back units are called by the
content handlers and create the appropriate logical objects needed
to accommodate the changed type of content, passing the objects back
down to the appropriate contents handlers for the conversion of the
informational contents (text, graphics, etc.).

Some segments in a CDA document contain only content of a single
type. The conversion process for such a segment is relatively
simple. The structure conversion component 21 identifies the type of
the contents and passes the segment to the appropriate content
handler 13. The selected content handler 13 converts the content of
the segment passed to it. Once the content conversion is complete,
the content handler uses the callback mechanism to identify where
the content should be stored. If for example the content is text
content, the content handler 13-1 calls back to the text callback
unit 22-1 in the DAP handler. The text callback unit 22-1 then
generates a basic logical object (BLO) of the text type, and stores
it to the most recent compound logical object (CLO) on the stack 20.
The text callback unit 22-1 then returns this new BLO to the text
content handler 13-1, which then stores the converted content to the
new BLO.

As noted above, the blocks themselves are assembled in the store 20.
However, the blocks are identified by pointers to them from the
content handlers, callback units, and stack 20. Thus "generating a
BLO and storing it to a CLO" involves creating the BLO and entering,
into the CLO, links to the newly created BLO; and "storing content
to a BLO" involves locating the BLO and entering the content into
it.

The processing of a sequence of elements of the same type in a
single segment is similar. What happens here is that as successive
elements of the text segment are encountered, their contents are
converted by the text content handler 13-1, and blocks are created
by the text callback unit 22-1 to contain the converted text
elements.

It is thus evident that in the present system as so far described,
the processing of the structure or layout of the document is
strictly separated from the processing of its contents. The
processing of the structure or layout is performed exclusively by
the selected DAP handler, with the various blocks being created by
the structure converter and the various callback units. The
processing of the contents is performed exclusively by the contents
architecture handlers, and the processing of each type of contents
is performed exclusively by the content architecture handler for
that type. These principles are maintained with the more complicated
situations described below.

In the situations discussed so far, there has been an exact
correspondence between the two systems (ODA/ODIF and CDA/DDIF).
However, there are situations in which this correspondence breaks
down, and the conversion system has to cope with these situations.
This is achieved by repeated transfers of control between the
call-back units and the content handlers. When the end of a
particular type of content is reached, the content handler unit
returns control to the unit which called it. If there is more of the
original type of contents, the contents handler then continues the
conversion process, with more blocks being created by the callback.
If not, the contents handler returns control back to the content
handler which called it or, if it was not called by a content
handler, to the structure conversion component 21.

To take a specific example, in CDA/DDIF a text segment can include
geometric graphics and/or raster (image) graphics contents. (A
geometric graphics segment can similarly contain image contents.) An
ODA/ODIF BLO (basic logical object (or basic layout object)) can
however contain only a single type of content.

When the DAP handler 12 encounters a text segment, it passes the
text content to the text content handler 13-1, which starts to read
the content. As successive text elements of the segment are
encountered, they are converted by the text handler 13-1. To store
this converted content, the text content handler 13-1 calls the text
callback 22-1 to request a BLO in which the converted content can be
stored. The callback unit 22-1 creates a BLO and stores it to the
most recent CLO on the stack 20, and then passes the new BLO back to
the text content handler 13-1. The text content handler 13-1 then
stores the converted text content to the new BLO.

When the text content handler encounters a geometric graphics
segment stored within the text segment, the text contents handler
cannot deal with it. The text contents handler stores any text
already converted by requesting a BLO from callback unit 22-1 and
storing the content to it, and then calls the GG content handler
13-2 and passes the geometric graphics segment to it. The GG content
handler 13-2 then starts to convert the geometric graphics content.
When the GG content handler 13-2 has completed conversion of the
content, it uses the callback mechanism to call back to (pass
control to) the GG callback unit 22-2. The callback unit 22-2
creates a BLO for geometric graphics content and stores it to the
most recent CLO on the stack 20, and passes the new BLO to the GG
content handler 13-2, which then stores the converted content to the
new BLO. As the GG content handler 13-2 has now completed the
content conversion of the geometric graphics segment, it then
returns control to the text content handler 13-1 which originally
called it. The text content handler 13-1 then resumes processing of
the remaining text content in the text segment.

This process is illustrated by FIG. 5. The left-hand part of this
figure shows a portion of a CDA document, consisting of a flow
segment FLOW containing a paragraph segment P1 which in turn
contains a text contents portion txt1, a nested segment containing a
geometric graphics content portion GeoG, and a further text contents
portion txt2. The corresponding ODA document is shown on the right,
consisting of a passage CLO (compound logical object) PASSAGE
containing a paragraph CLO PARA which in turn contains three BLOs
TEXT, GEOM, and TEXT, containing the text contents portion txt1, the
geometric graphics content portion GeoG, and the text contents
portion txt2 respectively. (It is convenient to use the same terms
for the contents portions in both CDA and ODA formats.)

The PASSAGE and PARA CLOs in the figure are structure elements which
are created by the structure conversion component for the DAP
handler. The TEXT and GEOM BLOs in the figure are created by the
text callback unit 22-1 and the GG callback unit 22-2 at the request
of the text content handler and the GG content handler respectively.
The various content-
s (txt1, GeoG, and txt2) are stored to the BLOs by the content
handlers. This process is described more fully in the following
paragraphs.

In the conversion process, the CDA document is initially processed
by the structure conversion component 21. This first encounters the
segment FLOW, and creates the corresponding object PASSAGE. Since
the PASSAGE object will have a substructure, it is a CLO, and it is
therefore entered in the stack 20 to await the creation of the
dependent objects. The structure conversion component 21 next
encounters the segment P1, and creates the corresponding object
PARA, which is also a CLO and entered in the stack 20.

The structure conversion component 21 next encounters the first
content element of the segment P1, and finds that it is of text
type. It therefore calls the text contents handler 13-1 and passes
the segment to it. The text content handler identifies the first
content element, txt1, and converts it. It then identifies the next
content element of the segment as being a segment, SEG1, with
geometric graphic content. The text content handler 13-1 therefore
has to relinquish control.

Before doing so, it calls the text callback unit 22-1, which creates
a text type BLO (TEXT) and stores it to the most recent CLO (the
PARA) on the stack 20. The text callback unit 22-1 then returns the
newly created BLO to the text content handler 13-1. The text content
handler 13-1 then stores the converted text, txt1, to the new BLO.
Then the text content handler 13-1 calls the GG content handler 13-2
and passes SEG1 to that content handler.

The GG content handler 13-2 now has control, and starts reading the
content of SEG1. It identifies the content GeoG and converts it.
When it reaches the end of the content of SEG1, the GG content
handler calls the GG callback unit 22-2. The GG callback unit 22-2
creates a geometric type BLO (GEOM) and stores it to the most recent
CLO (the PARA) on the stack 20. The GG callback unit 22-2 then
returns the new BLO to the GG content handler 13-2. The GG content
handler 13-2 then stores the converted content GeoG to the new BLO.
Finally, it returns control to the component which called it, the
text content handler 13-1.

The text content handler has now regained control, and proceeds to
continue reading the content of P1. As a result, it finds the next
element, txt2, and identifies it as being a further text element.
The text content handler 13-1 converts txt2 and then calls the text
callback unit 22-1 as before. The text callback unit 22-1 creates
another text type BLO (TEXT) and stores it to the PARA CLO on the
stack 20. It then returns the new BLO to the text content handler
13-1, which stores the converted content txt2 to the new BLO.

The text content handler 13-1 then recognizes that it has reached
the end of the content of P1, so it returns to the structure
conversion component 21 which originally called it. The structure
conversion component 21 then recognizes that it has completed the
processing of the P1 segment, so it removes the PARA CLO from the
stack and stores it to the next (now most recent) CLO (the FLOW) on
the stack 20, and continues to the next segment stored under FLOW.

Obviously if the segment P1 had contained only txt1 and SEG1 with
GeoG, then the text content handler 13-1 would have returned to the
structure conversion component 21 immediately after the GG content
handler 13-2 had completed processing (with the GEOM BLO being
completed) and returned to the text content handler 13-1.

The operation of the raster graphics (RG) call-back unit 22-3 is
similar. If any other component (the structure conversion component,
the text content handler or the GG content handler) identifies a
segment with raster graphics, then the RG contents handler is called
and the segment with raster content is passed to it. The content
handler 13-3 converts the content and calls the RG callback unit
22-3, which creates a raster type BLO and stores it to the ODA
document. The RG callback unit 22-3 returns the new BLO to the RG
content handler 13-3, which stores the converted content to the new
BLO and then returns to the component which called it.

The operation of the footnote call-back unit 22-4 is somewhat more
complicated. A footnote is actually a structure occurring within
text content, so a callback is provided to allow the DAP handler the
control of the structure processing.

FIG. 6 illustrates how footnotes are handled. A footnote consists of
a footnote text txtc which is the actual footnote body, a footnote
reference text txta which appears in the body of the main text, and
a footnote identifier text txtb which is the corresponding reference
which appears against the footnote body. The footnote [illegible]
contents portions txta (which appears in the main body of the text)
and txtb [illegible] are identical. Further, they [illegible]
computed contents rather than explicit contents; in other words, the
values of these portions will be computed by the system, rather like
page numbers, rather than the actual values being entered
explicitly.

In a CDA document, the footnote structure itself consists of a
footnote reference segment FNLABEL with the footnote reference text
txta, and a footnote segment FN with a footnote identifier
subsegment FNID with text txtb and a footnote text subsegment PARA
with the footnote text txtc. In the corresponding ODA document, the
footnote appears as an object tree of a CLO FOOTNOTE with a BLO
FNREF with text txta attached and a further CLO FNBODY which in turn
has two BLOs, FNNO with footnote identifier text txtb attached and
FNTEXT with footnote body text txtc attached.

When the text contents handler 13-1 encounters the footnote, i.e.
the FNLABEL plus FN combination, it calls the footnote call-back
unit 22-4 and passes the footnote to it. The callback unit 22-4 then
creates a FOOTNOTE CLO and stores it on the stack 20. The unit 22-4
then creates an FNREF BLO and stores it to the FOOTNOTE CLO. Since
the content of the footnote reference is computed content, the
appropriate attributes for computing the content are defined on the
FNREF BLO. Because the text contents portion txta is computed
contents, it does not need to be converted.

The callback unit 22-4 next creates the footnote body CLO FNBODY,
and stores it to the stack 20. It next creates the footnote number
BLO FNNO, and stores it to the FNBODY CLO; the text contents portion
txtb does not need to be converted as it is computed contents like
txta.

The footnote call-back unit 22-4 then identifies the content of the
PARA segment as being text, and therefore calls the text content
handler 13-1 to convert the content. This call is recursive; that
is, it is a new call to
the text content handler, from which the return will be to the
footnote callback unit. It is not a return to the text content
handler from the footnote callback unit.

The text content handler then processes the content of the PARA
segment by its normal operation, and the content txtc is converted.
The text content handler then calls the text callback unit 22-1,
which generates a BLO FNTEXT and stores it to the CLO FNBODY on the
stack 20. The BLO FNTEXT is then returned by the text callback unit
to the text content handler 13-1, which stores the converted txtc to
the new BLO, and returns control to the footnote callback unit 22-4
(thus ending the recursive call to the text content handler).

At this point the footnote processing is complete, so the footnote
callback unit 22-4 removes the CLO FNBODY from the stack 20 and
stores it to the CLO FOOTNOTE on the stack 20, which has just been
uncovered by the removal of the CLO FNBODY. Then the footnote
callback unit removes the FOOTNOTE CLO from the stack 20 and stores
it to the previous CLO (a PARA) on the stack 20. Finally the
footnote callback unit returns control to the text content handler,
which then continues to process txt2 in the usual way.

Considering more generally the relationships between the contents
handlers and the callback units, these are determined by the DAP
handler. Each content handler is provided with a particular
permitted set of callbacks by the DAP handler, and when a content
handler is required to call another content handler, it will in turn
pass on the same set of callbacks which it received from the DAP
handler. But each content handler can only call its own type of
callback units; thus the text contents handler 13-1 can only call
the text and footnote callback units 22-1 and 22-4, the geometric
graphics content handler 13-2 can only call the geometric graphics
callback unit 22-2, etc.

However, the permitted set of callbacks may be changed under
different circumstances, as is appropriate to the current position
in the document. For example, the type of text BLO required when
creating a document header may be different from the type required
when creating the body of a document. The text callback unit 22-1
will thus have two slightly different functionalities depending on
what part of the document is involved. The DAP handler (either the
structure conversion component or a footnote callback unit, which is
a component of the DAP handler like the structure conversion
component) knows which set of callbacks is appropriate and passes
them to the content handlers.

A footnote is also generally text. However, the functionalities
required for a callback in the event of a footnote being encountered
involve the generation of the various CLOs and BLOs shown and
discussed with reference to FIG. 6. These functionalities are so
different from those required in the generation of ordinary textual
segments that it is convenient to regard the footnote callback as
being performed by a footnote callback unit, 22-4, which is distinct
from the ordinary text callback unit 22-1.

As noted above, the processing of a footnote involves a recursive
call to the text handler 13-1. In principle, recursion can occur
quite generally. For example, if geometric graphics information is
encountered within text, text can be encountered within the
geometric graphics (e.g. as labels or legends). This will again
involve recursion, with a return from the text inside the graphics
back to the graphics ending the recursion.

In practice, however, the kinds and depth of recursion are likely to
be limited. Further, the DAP handler may impose restrictions on the
kinds or depth of recursion which are allowed. For example, the DAP
handler may not allow graphics to occur within footnotes. One way in
which the DAP handler can achieve this is by limiting the extent to
which the contents handlers can call each other. Thus the text
contents handler 13-1 can call either of the graphics handlers 13-2
and 13-3, but the raster graphics handler 13-3 cannot call the text
handler 13-1, because in CDA, text content will not occur within
raster graphics image content.

It was noted above that there are different DAP handlers for the
different DAPs. Such different DAP handlers may have different sets
of contents handlers associated with them. Different DAPs may
similarly have different sets of call-back units, and different
constraints and/or constraint mechanisms on them, in their
corresponding DAP handlers.

The above discussion has been in terms of the conversion of the
logical objects, i.e. of the informational contents of the
documents. The conversion of the layout information follows the same
principles.

We claim:

1. A data structure format conversion system comprising:

a plurality of profile conversion components for converting the
logical and/or layout structures of the source document to those of
the target format;

a plurality of content architecture conversion components selected
by the profile conversion components for converting the contents of
the source document to the target format and being invoked by the
required content architecture conversion component; and

a main converter component for identifying the profile to which the
compound document conforms and invoking the required profile
conversion component.

2. A system according to claim 1 wherein the content architecture
conversion components include a text conversion component, a
geometrical graphics conversion component, and a raster graphics
(bit image) conversion component.

3. A system according to claim wherein the profile conversion
components include components implementing ODA Q111, Q112, and Q121
standards.

4. A system according to claim 1 wherein each profile conversion
component comprises means for generating logical objects of the
target document architecture, comprising a structure converter unit
for generating compound logical objects of the target document and a
plurality of callback units, each for generating a different type of
basic logical object or set of logical objects including at least
one basic logical object; and each content architecture conversion
component converts a different one of the possible data types of the
content of the source document to the target format, each being
callable from the others and from the units of the structure
architecture conversion component, and each capable of calling back
to any one of a corresponding set of the callback units.

5. A system according to claim 2 wherein each profile conversion
component comprises means for generating logical objects of the
target document architecture, comprising a structure converter unit
for generating compound logical objects of the target document and a
plurality of callback units, each for generating a different type of
basic logical object or set of logical objects including at least
one basic logical object; and each content architecture conversion
component converts a different one of the possible data types of the
content of the source document to the target format, each being
callable from the others and from the units of the structure
architecture conversion component, and each capable of calling back
to any one of a corresponding set of the callback units.

6. A system according to claim 3 wherein the callback units include
a text callback unit, a graphics callback unit, and a footnote
callback unit.

7. A data structure format conversion system comprising:

a structure architecture conversion component for generating logical
objects of the target document architecture, comprising a structure
converter unit for generating compound logical objects of the target
document and a plurality of callback units, each for generating a
different type of basic logical object or set of logical objects
including at least one basic logical object;

a plurality of content architecture conversion components each for
converting a different one of the possible data types of the content
of the source document to the target format, each being callable
from the others and from the units of the structure architecture
conversion component, and each capable of calling back to any one of
a corresponding set of the callback units.

8. A system according to claim 7 wherein the content 
architecture conversion components include a text con- 
version component, a geometrical graphics conversion 
component, and a raster graphics (bit image) conversion 
component. 

9. A system according to claim 7 wherein the call- 
back units include a text callback unit, a graphics call- 
back unit, and a footnote callback unit. 

10. A system according to claim 8 wherein the call- 
back units include a text callback unit, a graphics call- 
back unit, and a footnote callback unit. 

11. A system according to claim 7 wherein there is a 
plurality of structure architecture conversion compo- 
nents, any one of which is selectable. 

12. A system according to claim 8 wherein there is a 
plurality of structure architecture conversion compo- 
nents, any one of which is selectable. 

13. A system according to claim 10 wherein there is a 
plurality of structure architecture conversion compo- 
nents, any one of which is selectable. 

14. A system according to claim 7 wherein the structure architecture
conversion components include components implementing ODA Q111,
Q112, and Q121 standards.